Abstract

In the Internet era, information overload and the challenge of detecting quality
content have raised the issue of how to rank both resources and users in online
communities. In this paper we develop a general ranking method that can simultaneously
evaluate users' reputation and objects' quality in an iterative procedure, and that
exploits the trust relationships and social acquaintances of users as an additional source
of information. We test our method on two real online communities, the EconoPhysics
forum and the Last.fm music catalogue, and determine how different variants of the
algorithm influence the resultant ranking. We show the benefits of considering trust
relationships, and identify the form of the algorithm best suited to common situations.

1 Introduction 

Nowadays, ranking techniques and reputation systems are widely employed in e-commerce 
services, where buyers and sellers may give each other a score after a completed transaction,
encouraging good behavior in the long term [1]. Other reputation systems are content-based,
in the sense that users are evaluated by their contribution [2]. In the field of search engines, 
PageRank [3], the most successful algorithm for ranking web pages, is basically a random 
walk process on the directed graph of websites and hyperlinks. HITS (Hyperlink-Induced
Topic Search [4]), a predecessor of PageRank, instead assigns to web pages two different
scores: as hub and as authority. Thanks to this twofold nature of the score, HITS was later
generalized [5] to bipartite graphs, an important class of systems where entities are divided
into two disjoint sets such that interactions happen only between entities in different sets.
Examples of systems modeled by bipartite graphs include reviewers and movies in rental 
websites, scientists and papers in citation networks, customers and products in e-commerce 
services, and so on. In these systems, each set is endowed with only one kind of score, 
and if the two sets consist of users and objects it is natural to associate these scores with 
reputation and quality, respectively. However, bipartite networks are often embedded in the 
social network of the participant users: for instance, in websites like Digg.com or Last.fm, 
users can select other users as their friends; also, citation networks are naturally influenced 
by the professional relationships among scientists. The underlying social network represents
an additional source of information that a ranking algorithm may exploit, as social links can
be associated with trust relationships among users. This is similar to recently proposed
recommendation techniques that make use of social ties to obtain recommendations [6].

In this work, we propose a novel and generalized ranking algorithm for bipartite systems
to assign quality values to objects and reputation values to users. This method, which
we name QTR (Quality, Trust and Reputation), also exploits the information coming from
the users' social relationships. QTR is a generalized algorithm in the sense that it can be
easily adapted to different situations (e.g. by giving more weight to certain kinds of actions,
or to a particular behavior of users). We test our method on two different datasets, the
EconoPhysics forum online community and the Last.fm online radio and social network,
which are particularly suited for our generalized algorithm, as will be explained later. The
results of our study are twofold. We first confirm that ranking is a difficult task, and that an
improper algorithm or a peculiarly-structured dataset can lead to extremely biased results.
Hence we propose a form of QTR that is effective in avoiding such bias. In addition,
we show that social relationships can play a valuable role in improving the quality of the
ranking.



The rest of this paper is organized as follows. Section 2 presents the QTR ranking
method, including its relation with HITS. Section 3 describes each dataset used for
testing and reports the results of our analysis. Section 4 discusses possible
generalizations of the QTR algorithm, and Section 5 concludes.

2 Generalized QTR algorithm 

Before presenting our ranking method, we describe the underlying bipartite system and
introduce some notation. A bipartite network consists of two disjoint sets of entities (nodes),
which for convenience we call users (labeled by Latin letters, $i = 1, \dots, N$) and objects
(labeled by Greek letters, $\alpha = 1, \dots, M$). An interaction between user $i$ and object $\alpha$ is
represented by a link connecting the two. Using the formalism of the adjacency matrix,
we say that $a_{i\alpha}$ equals 1 if an interaction has occurred, and 0 otherwise. More generally,
such an interaction can be represented by a weighted link $w_{i\alpha}$, where the strength of the link
depends on which particular interaction has occurred or how important/demanding that
interaction was. We can further define the degree of user $i$ as the number of objects that
user has interacted with, $k_i = \sum_\alpha a_{i\alpha}$, and the degree of object $\alpha$ as the number of users
who interacted with it, $k_\alpha = \sum_i a_{i\alpha}$. The total weight of user $i$ is instead defined as
$k_i^w = \sum_\alpha w_{i\alpha}$, and that of object $\alpha$ as $k_\alpha^w = \sum_i w_{i\alpha}$.

Aside from the bipartite network, users interact with each other in a monopartite social
network, where we say that the adjacency matrix element $b_{ij}$ equals 1 if user $i$ is a friend
of user $j$ or trusts user $j$ (note that in the first case the matrix is symmetric, whereas it is
not in the second). As before, we can introduce a weighted link $T_{ij}$ which represents the
"amount of trust" user $i$ puts in user $j$. The number of friends of user $j$ (or of users who trust user $j$) is
$f_j = \sum_i b_{ij}$, whereas the total weight of user $j$ is $f_j^w = \sum_i T_{ij}$.
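
The degrees and total weights defined above can be computed directly from the weight matrix. The following is a minimal sketch, assuming the bipartite network is stored as a nested list `w[i][a]` in which a zero entry means "no interaction"; the toy values and the function names are hypothetical and serve only to illustrate the definitions.

```python
# Hypothetical toy data: w[i][a] is the weight of the link between
# user i and object a (0 means no interaction). Values are made up.
w = [
    [1.0, 0.1, 0.0],
    [0.0, 0.05, 0.1],
]

def user_degrees(w):
    """k_i: number of objects that user i has interacted with."""
    return [sum(1 for x in row if x > 0) for row in w]

def object_degrees(w):
    """k_alpha: number of users who interacted with object alpha."""
    return [sum(1 for row in w if row[a] > 0) for a in range(len(w[0]))]

def user_weights(w):
    """Total weight of user i: sum of w[i][a] over objects."""
    return [sum(row) for row in w]

def object_weights(w):
    """Total weight of object alpha: sum of w[i][a] over users."""
    return [sum(row[a] for row in w) for a in range(len(w[0]))]
```

The same functions applied to the trust matrix would give the number of trusting users and the total trust received, provided rows index the trusting user and columns the trusted one.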

2.1 Definition and Interrelation of Quality and Reputation 

We shall now define the meaning of the ranking scores the algorithm assigns to objects and 
users: quality and reputation, respectively. 

Quality is not an inherent property of an object; rather, it is constructed through
interactions of the community with the object itself. Reputation represents the general 
opinion of the community towards a user, hence it is ascribed by others and assessed on the 
basis of the user's actions. We use these conceptual definitions to write down the equations 
for the QTR ranking method: 

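$$Q_\alpha = \frac{1}{k_\alpha^{\theta_Q}} \sum_{i=1}^{N} w_{i\alpha} \left[ R_i - \rho_R \bar{R} \right] \tag{1}$$

$$R_i = \frac{1}{k_i^{\theta_R}} \sum_{\alpha=1}^{M} w_{i\alpha} \left[ Q_\alpha - \rho_Q \bar{Q} \right] + \frac{1}{f_i^{\theta_T}} \sum_{j=1}^{N} \left[ R_j - \rho_R \bar{R} \right] \left[ T_{ji} - \rho_T \bar{T} \right] \tag{2}$$

where $\bar{Q} = \sum_\alpha Q_\alpha / M$, $\bar{R} = \sum_i R_i / N$, $\bar{T} = \sum_{i,j} T_{ij} / [N(N - 1)]$ are the average values of
quality, reputation and trust in the community, and $\theta_Q$, $\theta_R$, $\theta_T$, $\rho_Q$, $\rho_R$, $\rho_T$ are control
parameters, all varying in the range [0, 1].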

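Since equations (1) and (2) are mutually interconnected, quality and reputation values
can be determined iteratively. Starting with evenly distributed scores $Q_\alpha^{(0)} = 1/\sqrt{M}\ \forall \alpha$,
$R_i^{(0)} = 1/\sqrt{N}\ \forall i$, values of quality and reputation at iteration step $n + 1$ are computed from
the values at the previous step $n$ by:

$$Q_\alpha^{(n+1)} = \frac{1}{k_\alpha^{\theta_Q}} \sum_{i=1}^{N} w_{i\alpha} \left[ R_i^{(n)} - \rho_R \bar{R}^{(n)} \right]$$

$$R_i^{(n+1)} = \frac{1}{k_i^{\theta_R}} \sum_{\alpha=1}^{M} w_{i\alpha} \left[ Q_\alpha^{(n)} - \rho_Q \bar{Q}^{(n)} \right] + \frac{1}{f_i^{\theta_T}} \sum_{j=1}^{N} \left[ R_j^{(n)} - \rho_R \bar{R}^{(n)} \right] \left[ T_{ji} - \rho_T \bar{T} \right]$$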

To avoid divergence, normalization is applied at the end of each step so that: 

$$\sum_{\alpha=1}^{M} \left[ Q_\alpha^{(n+1)} \right]^2 = 1, \qquad \sum_{i=1}^{N} \left[ R_i^{(n+1)} \right]^2 = 1$$

The iterative procedure stops when the algorithm converges to a stationary state: 

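$$\sum_{\alpha=1}^{M} \left| Q_\alpha^{(n+1)} - Q_\alpha^{(n)} \right| + \sum_{i=1}^{N} \left| R_i^{(n+1)} - R_i^{(n)} \right| < \delta$$

where $\delta$ is a small convergence threshold.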
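
The iterative procedure (update, normalization, stopping rule) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the function name `qtr`, the nested-list data layout, the guard against zero degrees and the exclusion of self-trust terms are all assumptions made here.

```python
import math

def qtr(w, T, theta_q=0.0, theta_r=0.0, theta_t=0.0,
        rho_q=0.0, rho_r=0.0, rho_t=0.0, delta=1e-8, max_iter=1000):
    """Iterate equations (1)-(2) and return (Q, R) as lists.

    w[i][a]: weight of the link between user i and object a (N x M).
    T[j][i]: trust that user j puts in user i (N x N, zero diagonal).
    """
    N, M = len(w), len(w[0])
    # Degrees; max(1, ...) avoids division by zero (an assumption).
    k_obj = [max(1, sum(1 for i in range(N) if w[i][a] > 0)) for a in range(M)]
    k_usr = [max(1, sum(1 for x in w[i] if x > 0)) for i in range(N)]
    f_usr = [max(1, sum(1 for j in range(N) if T[j][i] > 0)) for i in range(N)]
    pairs = N * (N - 1)
    t_mean = sum(map(sum, T)) / pairs if pairs else 0.0

    def normalise(v):
        # Rescale so that the sum of squares equals 1.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else v

    Q = [1 / math.sqrt(M)] * M
    R = [1 / math.sqrt(N)] * N
    for _ in range(max_iter):
        q_mean, r_mean = sum(Q) / M, sum(R) / N
        Q_new = [sum(w[i][a] * (R[i] - rho_r * r_mean) for i in range(N))
                 / k_obj[a] ** theta_q for a in range(M)]
        R_new = []
        for i in range(N):
            from_objects = sum(w[i][a] * (Q[a] - rho_q * q_mean)
                               for a in range(M))
            # Self-trust (j == i) is skipped: an assumption of this sketch.
            from_trust = sum((R[j] - rho_r * r_mean) * (T[j][i] - rho_t * t_mean)
                             for j in range(N) if j != i)
            R_new.append(from_objects / k_usr[i] ** theta_r
                         + from_trust / f_usr[i] ** theta_t)
        Q_new, R_new = normalise(Q_new), normalise(R_new)
        change = (sum(abs(x - y) for x, y in zip(Q_new, Q))
                  + sum(abs(x - y) for x, y in zip(R_new, R)))
        Q, R = Q_new, R_new
        if change < delta:
            break
    return Q, R

# Toy usage: two users, two objects, no trust links.
Q, R = qtr([[1.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Both scores are updated from the values of the previous step, as in the text. With all parameters left at zero and no trust links, the loop is a plain power iteration.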



The QTR algorithm just introduced is a generalization of the well-known HITS algorithm
for bipartite graphs [5], namely:

$$Q_\alpha = \sum_i w_{i\alpha} R_i, \qquad R_i = \sum_\alpha w_{i\alpha} Q_\alpha \tag{3}$$

QTR reduces to standard HITS when all parameters $\theta_Q$, $\theta_R$, $\theta_T$, $\rho_Q$, $\rho_R$, $\rho_T$ go to zero
and $T_{ij} = 0\ \forall i,j$. However, we shall see in what follows that, although they make analytical
treatment hard, these parameters are extremely valuable in controlling the outcome of the
ranking, and that trust represents additional information which is worth considering.
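
The reduction to HITS can be checked by direct substitution. The short derivation below uses only equations (1) and (2), together with the fact that any positive degree raised to the power zero equals one:

```latex
\begin{align*}
\theta_Q = \rho_R = 0
  &\;\Rightarrow\;
  Q_\alpha = \frac{1}{k_\alpha^{0}} \sum_{i=1}^{N} w_{i\alpha}\,[R_i - 0 \cdot \bar{R}]
           = \sum_{i=1}^{N} w_{i\alpha} R_i , \\
\theta_R = \theta_T = \rho_Q = \rho_T = 0,\; T_{ji} = 0
  &\;\Rightarrow\;
  R_i = \sum_{\alpha=1}^{M} w_{i\alpha} Q_\alpha
        + \frac{1}{f_i^{0}} \sum_{j=1}^{N} [R_j - \rho_R \bar{R}] \cdot 0
      = \sum_{\alpha=1}^{M} w_{i\alpha} Q_\alpha .
\end{align*}
```

Both lines coincide with equation (3), so the trust term and the four remaining parameters are the only ingredients that distinguish QTR from HITS.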



3 Experimental results 

In this section we test the QTR algorithm on two different datasets, the EconoPhysics 
forum community and the Last.fm online radio and social network. We present the rankings 
obtained for objects and users for different values of the model parameters. In order to 
better describe our results, we make use of Pearson correlation coefficients between various 
pairs of quantities: $Q_\alpha$ and $k_\alpha$ ($c_{Qk}$), $Q_\alpha$ and $k_\alpha^w$ ($c_{Qw}$), $R_i$ and $k_i$ ($c_{Rk}$), $R_i$ and $k_i^w$ ($c_{Rw}$), and, only
for Last.fm, $R_i$ and $f_i$ ($c_{Rf}$).
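
For reference, the Pearson coefficient between a score vector and a degree vector can be computed as in the sketch below. The function name `pearson` and the choice to return NaN for a constant input are assumptions of this example, not something specified in the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        # The coefficient is undefined when one sequence is constant.
        return float("nan")
    return cov / (spread_x * spread_y)
```

For instance, a coefficient such as $c_{Qk}$ would be obtained by passing the list of quality scores and the list of object degrees, in the same object order.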

3.1 EconoPhysics Forum 

The EconoPhysics Forum (http://unifr.ch/econophysics/) is an online platform for inter- 
disciplinary collaboration between physicists and social scientists. Users of the forum can 
share different resources related to econophysics and complexity science. In what follows, 
we will consider as objects only the papers uploaded to the forum. As a consequence, a user 
action can be either uploading, downloading, or viewing a paper. To obtain the dataset 
of interactions, we analyzed the forum's weblogs dating from 6th July 2010 until 1st June 
2012. We removed all entries corresponding to web bots (which cause approximately 75% of 
the traffic) and repeated access (a user viewing/downloading the same paper several times). 
We also removed all papers uploaded before 6th July 2010 (for which we do not know the 
uploader) and all actions associated with them. Finally, we removed the users who did
not upload any paper and performed only one view or download action. Altogether, our refined
data contains 3511 users, 597 papers and 19578 links.
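
Two of the cleaning steps described above (dropping repeated accesses and removing users with a single non-upload action) can be sketched as follows. The record format `(user, paper, action)` and the function name `clean` are hypothetical; bot filtering and the removal of papers with unknown uploader are assumed to have been done beforehand.

```python
from collections import defaultdict

def clean(records):
    """Deduplicate records and drop users with one non-upload action.

    records: iterable of (user, paper, action) tuples, where action is
    "upload", "download" or "view" (a hypothetical format).
    """
    # Repeated access: keep each (user, paper, action) only once, in order.
    unique = list(dict.fromkeys(records))
    actions_by_user = defaultdict(list)
    for user, _paper, action in unique:
        actions_by_user[user].append(action)
    # Keep users who uploaded something or performed more than one action.
    keep = {user for user, actions in actions_by_user.items()
            if "upload" in actions or len(actions) > 1}
    return [record for record in unique if record[0] in keep]
```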

Figure 1: Probability distributions of Q (left) and R (right) values in EconoPhysics forum data for different
configurations of the QTR algorithm. Insets: prolongation to negative values.

Table 1: Top-2 papers (top) and users (bottom) obtained by different configurations of QTR for EconoPhysics
forum data. Top papers are: 295 (A. Storkey, Machine Learning Markets, 2011), 102 (R. Tsekov, Brownian
markets, 2010), 260 (T. Preis, Switching processes in financial markets, 2011), 263 (T. Preis, Econophysics - complex
correlations and trend switchings in financial time series, 2011), 525 (M. Hisakado, Two kinds of Phase transitions
in a Voting model, 2012) and 530 (A. Zeng, Enhancing network robustness for malicious attacks, 2012).

Among the three types of users' access considered (uploading, downloading and viewing
a paper), the first is obviously the most demanding, whereas the second reflects the user's
interest in the paper much better than the third. Hence we can associate to each action a
different weight. In what follows, we set w = 1.0 for upload actions, w = 0.1 for download
actions and w = 0.05 for view actions. Of course this is just a particular choice, which
we consider reasonable, and we are going to investigate different weighting systems in
future works. We are also currently running an online survey¹ to determine how these
different actions are perceived by scientists; this will allow for a more justified choice of
the weights. In any case, the freedom to choose the particular set of weights² is what makes
the EconoPhysics dataset an ideal candidate for testing QTR, even though it does not contain
information about users' social or trust relationships.
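
The weighting scheme can be expressed as a small lookup table. In the sketch below the record format and the function name `build_weights` are hypothetical, and the rule for a user who performed several kinds of action on the same paper (keep the largest weight) is an assumption, since the text does not specify how such cases are combined.

```python
# Action weights as chosen in the text.
ACTION_WEIGHT = {"upload": 1.0, "download": 0.1, "view": 0.05}

def build_weights(actions):
    """Map (user, paper) pairs to link weights.

    actions: iterable of (user, paper, action) tuples (hypothetical format).
    If a user acted on the same paper in several ways, the largest
    weight is kept (an assumption of this sketch).
    """
    weights = {}
    for user, paper, action in actions:
        key = (user, paper)
        weights[key] = max(weights.get(key, 0.0), ACTION_WEIGHT[action])
    return weights
```

Changing the values in `ACTION_WEIGHT` is all that is needed to explore a different weighting system.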

¹ Available at http://surveys.soh.surrey.ac.uk/limesurvey/index.php?sid=14327&lang=en

² We remark that one is nevertheless constrained to a region of the weight space, because if some action(s)
become dominant then the other(s) lose their significance and the graph becomes much sparser (as if there
were only dominant actions).

Table 2: Correlation coefficients obtained by different configurations of the QTR algorithm on the EconoPhysics
forum data.

We test the QTR algorithm on these data with different values of the parameters $\theta_Q$,
$\theta_R$, $\rho_Q$, $\rho_R$. Since social relationships are absent ($T_{ij} = 0\ \forall i,j$), $\theta_T$ and $\rho_T$ are
meaningless. The particular form of the algorithm under consideration will be labeled by
the parameter values used: for instance, 0000 means $\theta_Q = 0$, $\theta_R = 0$, $\rho_Q = 0$, $\rho_R = 0$ (which
corresponds to standard HITS). Apart from HITS, we make use of three other configurations:
1100, 0110, 1111 (we do not use 0011 as it shows convergence problems). 0110 was chosen
instead of 1001 as in our opinion it is more reasonable to penalize high-degree users than
high-degree objects. Figure 1 shows the probability distributions of Q and R values generated by
QTR, and Table 1 the top-2 users and papers for each configuration. For R, we immediately
notice that one extremely high value (very close to one) is present in all cases. In 0000,
the top user is the system administrator, who is the uploader of many papers, and this
is why all his uploads get the same (high) score: the algorithm is not able to distinguish
between them. In both 1100 and 1111, top users have very low degree, which is also an
undesirable feature: a single good action should not be enough to obtain high reputation.
At the same time, top papers here are very recent works that attracted the attention of
a few highly-reputed users. Finally, in 0110 we obtain the best situation, where the scores
are distributed more evenly, top users have a non-negligible number of contributions and
top papers have on average more citations than in the other settings (although we do not
consider citation count as a perfect benchmark for quality). Table 2 further shows that this
is the only case in which $c_{Qk}$ and $c_{Qw}$ are positive, whereas $c_{Rk}$ and $c_{Rw}$ are close to 0.

3.2 Last.fm 

Last.fm (http://www.last.fm/) is a music website which records details of the songs users
listen to (from Internet radio stations, personal computers and portable music devices), and
provides them with personalized recommendations. The site also offers social networking
features, through which users can become friends with each other and join groups. The dataset
we analyzed is available online³ and was generated by the Information Retrieval Group at
Universidad Autonoma de Madrid [7]. It contains 1892 users, 17632 artists, 92834 artist
listening records and 12717 bi-directional friend relations. A peculiar feature of the data is
that the users' degree is almost always equal to 50. This is because the Last.fm service is free
for users in UK, US and Germany, but users in other countries require a subscription to use
the radio service and have to pay a fee after a 50 track free trial.

Since the artist listening records from users are labeled by the total listening counts,
the weighting system for the bipartite network comes out automatically. The social
network, instead, only contains the friendship relations of the users. In order to have the two terms
in the sum of equation (2) of the same magnitude, we set $T_{ij} = \bar{w}\,(\bar{k}/\bar{f})$ whenever $b_{ij} = 1$,
and $T_{ij} = 0$ otherwise (here $\bar{w}$ is the average of all weights in the bipartite network, $\bar{k}$ is
the average users' degree in the bipartite network and $\bar{f}$ is the average users' degree in the
monopartite social network). Within this framework, $\rho_T$ loses its meaning while $\theta_T$ does
not. To be consistent with the previous analysis, we set here $\rho_T = \theta_T = 0$ and use the same
configurations as before. To better illustrate the role of trust, we consider both the cases in
which $T_{ij} = 0\ \forall i,j$ ("without trust") and $T_{ij} \neq 0$ ("with trust").
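
The uniform trust weight assigned to friend pairs can be computed as in the following sketch. The function name `friend_trust` and the nested-list layout (a weight matrix `w` and a 0/1 friendship matrix `b`) are assumptions made for illustration; averages are taken over existing links for the weights and over all users for the degrees.

```python
def friend_trust(w, b):
    """Return T with T[i][j] = w_mean * k_mean / f_mean for friends, else 0.

    w[i][a]: listening count of user i for artist a (0 if none).
    b[i][j]: 1 if users i and j are friends, 0 otherwise.
    """
    n = len(w)
    links = [x for row in w for x in row if x > 0]
    w_mean = sum(links) / len(links)           # average link weight
    k_mean = len(links) / n                    # average user degree (bipartite)
    f_mean = sum(sum(row) for row in b) / n    # average number of friends
    if f_mean == 0:
        # No friendships at all: every trust value stays at zero.
        return [[0.0] * n for _ in range(n)]
    t = w_mean * k_mean / f_mean
    return [[t if b[i][j] else 0.0 for j in range(n)] for i in range(n)]
```

One way to read this choice: the product of the average weight and the average degree is the average total weight per user, so dividing it by the average number of friends makes a typical user's trust sum comparable to the typical object sum.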

³ http://www.grouplens.org/node/462

Figure 2: Probability distributions of Q (left) and R (right) values in Last.fm data for different settings of the
QTR algorithm, when trust is not taken into account.

Figure 3: Probability distributions of Q (left) and R (right) values in Last.fm data for different settings of the
QTR algorithm, when trust is taken into account.

Table 3: Top-2 artists (top) and users (bottom) obtained by different configurations of QTR for Last.fm data
when trust is not taken into account. Top artists are: 72 (Depeche Mode), 1072 (Martin L. Gore), 289 (Britney
Spears), 89 (Lady Gaga), 792 (Thalia) and 2390 (Monica Naranjo).

Table 4: Top-2 artists (top) and users (bottom) obtained by different configurations of QTR for Last.fm data when
trust is taken into account. New top artists: 292 (Christina Aguilera), 6373 (Tyler Adam) and 18121 (Rytmus).

Figures 2 and 3 show the probability distributions of Q and R values, and Tables 3 and 4
the top-2 users and artists for each configuration. When trust is not taken into account, we
notice again the presence of isolated and extremely high values, especially for Q (this effect
is less evident for 0110 and also for 0000 here, because of the absence of an overwhelmingly
active user/popular artist). However, if trust is considered, scores become distributed more
evenly for each QTR configuration. Table 5 gives additional confirmation of the benefit
brought by considering trust: without trust, $c_{Qk}$ and $c_{Qw}$ are positive only for 0000 and
0110 (the latter is better), and $c_{Rf}$ is always close to 0, as expected; with trust, $c_{Qk}$ and $c_{Qw}$ grow
slightly for 0000 and considerably for 0110, whereas $c_{Rf}$ is now always close to 1 as it should
be: users with many friends/trusted by many should indeed be highly reputed. We remark
that the effect of trust can be tuned by adjusting the weights of the friend relationships.

4 Further generalizations 

In this section we discuss two further generalizations of the QTR algorithm, which will be 
studied and tested in future works. 

4.1 Time decay 

Table 5: Correlation coefficients obtained by different configurations of the QTR algorithm on the Last.fm
dataset.

Bipartite systems and their related social networks are not static but instead evolve in
time. This means that new users can join the community, whereas other users who are
already members may become inactive after a while. On the other hand, newly appeared
objects can become hits in almost no time, whereas old objects usually end up losing their
attractiveness. Because of these features, a ranking algorithm should be able to handle time
effects, for instance by avoiding giving high scores to objects which were very popular in
the past but whose relevance is currently negligible, or by giving low scores to users who
were reliable in the past but then started to behave badly. We can hence introduce in the
equations a decaying function of time $D(\tau)$:

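$$Q_\alpha(t) = \frac{1}{k_\alpha^{\theta_Q}} \sum_{i=1}^{N} w_{i\alpha} \left[ R_i(t) - \rho_R \bar{R}(t) \right] D(\tau_{i\alpha}) \tag{4}$$

$$R_i(t) = \varepsilon + \frac{1}{k_i^{\theta_R}} \sum_{\alpha=1}^{M} w_{i\alpha} \left[ Q_\alpha(t) - \rho_Q \bar{Q}(t) \right] D(\tau_{i\alpha}) + \frac{1}{f_i^{\theta_T}} \sum_{j=1}^{N} \left[ R_j(t) - \rho_R \bar{R}(t) \right] \left[ T_{ji}(t) - \rho_T \bar{T}(t) \right] D(\tau_{ji}) \tag{5}$$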

where $t$ is the current time, $\tau_{i\alpha} = t - t_{i\alpha}$ is the age of the interaction of user $i$ and object
$\alpha$, $\tau_{ij} = t - t_{ij}$ is the age of the trust relationship between users $i$ and $j$, and $\varepsilon$ is the
small positive reputation assigned to new members of the community (who do not have any
interaction yet). The decay function $D(\tau)$ can have a non-zero tail even when $\tau$ is large, and
the strength of the decay can be tuned to focus on a particular time window. Some examples
of decay functions include $D(\tau) = [1 + (\tau/\tau_0)^\beta]^{-1}$ or $D(\tau) = d_0 + (1 - d_0) \exp[-\tau/\tau_0]$, where
$\tau_0$ is the characteristic time scale of decay.
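
The two example decay functions translate directly into code. The sketch below is only illustrative; the function names `decay_power` and `decay_exp` and the parameter values used in the usage lines are hypothetical.

```python
import math

def decay_power(tau, tau0, beta):
    """Power-law decay: D(tau) = 1 / (1 + (tau / tau0) ** beta)."""
    return 1.0 / (1.0 + (tau / tau0) ** beta)

def decay_exp(tau, tau0, d0):
    """Exponential decay with floor d0: D(tau) = d0 + (1 - d0) * exp(-tau / tau0)."""
    return d0 + (1.0 - d0) * math.exp(-tau / tau0)

# Hypothetical usage: weight of a 30-day-old interaction with tau0 = 10 days.
recent_weight = decay_power(30.0, 10.0, 2.0)
floored_weight = decay_exp(30.0, 10.0, 0.2)
```

Both functions equal 1 for a brand-new interaction. The power-law form tends to 0 for very old interactions, whereas the exponential form levels off at `d0`, which gives the non-zero tail mentioned in the text.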



4.2 Projected trust 

Trust is the subjective opinion of one user towards another. We argue that, when no explicit 
assessments from users are available, trust relationships can be inferred from the bipartite
network by measuring the similarity of users' actions, which essentially means by projecting 
the bipartite user-object network into the monopartite user-user network: 

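$$\tilde{T}_{ij}(t) = \frac{R_j(t) - \rho_R \bar{R}(t)}{k_j^{\theta_R}} \sum_{\alpha=1}^{M} w_{i\alpha} w_{j\alpha} \frac{Q_\alpha(t) - \rho_Q \bar{Q}(t)}{k_\alpha^{\theta_Q}} D(\tau_{i\alpha}) D(\tau_{j\alpha}) \tag{6}$$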

We call this term "projected" trust. Despite the fact that projected trust values are
computed with the same source of information used for quality and reputation assessment,
preliminary results (not reported here) show that using $\tilde{T}$ instead of $T$ values in a slightly
modified version of the algorithm can bring some improvements with respect to simple
HITS, especially when the bipartite network is sparse.
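
As an illustration of the projection idea, the sketch below computes a static version of projected trust, with all decay factors set to 1. It follows one reading of equation (6): the trust of user i in user j is the overlap of their links, weighted by the quality of the shared objects and by the reputation of j. The function name, the data layout and the zero-degree guard are assumptions of this example.

```python
def projected_trust(w, Q, R, theta_q=0.0, theta_r=0.0, rho_q=0.0, rho_r=0.0):
    """Static projected trust (no time decay) from bipartite weights.

    w[i][a]: weight of the link between user i and object a.
    Q, R: current quality and reputation scores (lists).
    """
    N, M = len(w), len(w[0])
    # max(1, ...) avoids division by zero for isolated nodes (assumption).
    k_usr = [max(1, sum(1 for x in row if x > 0)) for row in w]
    k_obj = [max(1, sum(1 for i in range(N) if w[i][a] > 0)) for a in range(M)]
    q_mean, r_mean = sum(Q) / M, sum(R) / N
    T = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i == j:
                continue  # no self-trust
            overlap = sum(w[i][a] * w[j][a] * (Q[a] - rho_q * q_mean)
                          / k_obj[a] ** theta_q for a in range(M))
            T[i][j] = (R[j] - rho_r * r_mean) / k_usr[j] ** theta_r * overlap
    return T
```

Users with no object in common get zero projected trust, so the projected network is only as dense as the co-interaction pattern of the bipartite one.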



5 Conclusion 

In this work we introduced a general ranking method for bipartite networks that can
simultaneously evaluate users' reputation and objects' quality. This is by no means the first
attempt in the literature [5], [8], [9]; however, our method differs from the others by exploiting
the trust relationships and social acquaintances of users as an additional source of
information. Testing our method on real datasets revealed which form of the algorithm gives more
reasonable results. In addition, we showed that considering trust relationships indeed brings
improvements to the resultant ranking. The positive results we obtained are encouraging.
However, the number of parameters used by the algorithm, and in general the difficulties in
assessing the reliability of a ranking method, raise additional questions about the effectiveness of
our method, which will require further tests and future studies.